{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": [],
      "authorship_tag": "ABX9TyOjHmqyDH2AifcjLWo5MQHm",
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/LC1332/personality-text-generation/blob/main/lulu_exp/%E5%BF%83%E7%90%86%E9%97%AE%E7%AD%94%E5%AE%A1%E6%A0%B8.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "metadata": {
        "id": "V3akZ0GcY-qE"
      },
      "outputs": [],
      "source": [
        "long_name = {\n",
        "    'N':'neuroticism',\n",
        "    'E':'extraversion',\n",
        "    'O':'openness',\n",
        "    'A':'agreeableness',\n",
        "    'C':'conscientiousness'\n",
        "}"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "input_name = 'questions.jsonl'\n",
        "\n",
        "Each line of this file is a JSON object.\n",
        "\n",
        "Please read it into Python and store the records in a list called datas."
      ],
      "metadata": {
        "id": "ZUyG63qjZ6-a"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "input_name = 'questions.jsonl'\n",
        "\n",
        "import json\n",
        "\n",
        "datas = []\n",
        "\n",
        "with open(input_name, encoding='utf-8') as f:\n",
        "    for line in f:\n",
        "        data = json.loads(line)\n",
        "        datas.append(data)"
      ],
      "metadata": {
        "id": "M1oq54G6Z4vL"
      },
      "execution_count": 7,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "print(datas[1])"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "lK9mk2HsaSY1",
        "outputId": "572b4e6a-a5d3-4737-9a34-7df273f613ed"
      },
      "execution_count": 9,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "{'id': 1, 'question': '我真的喜欢大部份我遇见的人。', 'factor': 'extraversion', 'if_pos': 'true'}\n"
          ]
        }
      ]
    },
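    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "A quick sanity check on the loaded records can catch a malformed file early. The sketch below assumes `datas` has been loaded by the cell above and that records carry a `factor` key like the sample printed above; `factor_counts` is a name introduced here for illustration."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from collections import Counter\n",
        "\n",
        "# Count how many questions belong to each Big Five factor.\n",
        "# Records without a 'factor' key (not expected, but possible) are grouped under 'missing'.\n",
        "factor_counts = Counter(item.get('factor', 'missing') for item in datas)\n",
        "\n",
        "for factor, count in factor_counts.most_common():\n",
        "    print(f'{factor}: {count}')"
      ]
    },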
    {
      "cell_type": "code",
      "source": [
        "!pip install -q langchain tiktoken"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "nEcWbWC2dBm_",
        "outputId": "3b83bf26-5c0e-40d0-d039-61c82fafa6c9"
      },
      "execution_count": 12,
      "outputs": [
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "!pip install -q openai"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "2T6C4SGcdsdV",
        "outputId": "3c1b99a4-aaf6-42dc-c05d-b19cdaf1d0da"
      },
      "execution_count": 17,
      "outputs": [
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "#@title Detailed per-factor definitions of the Big Five personality traits\n",
        "\n",
        "# big5_to_prob_id = {\n",
        "#     'neuroticism': [1, 6, 11, 16, 21, 26, 31, 36, 41, 46, 51, 56],\n",
        "#     'extraversion': [2, 7, 12, 18, 22, 27, 32, 37, 42, 47, 52, 57],\n",
        "#     'openness': [3, 8, 13, 17, 23, 28, 33, 38, 43, 48, 53, 58],\n",
        "#     'agreeableness': [4, 9, 14, 19, 24, 29, 34, 39, 44, 49, 54, 59],\n",
        "#     'conscientiousness': [5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60]\n",
        "# }\n",
        "\n",
        "big5_to_detail = {}\n",
        "\n",
        "big5_to_detail['conscientiousness'] = \"\"\"Conscientiousness refers to the way we control, regulate, and direct our impulses. It assesses organization, persistence, and motivation in goal-directed behavior. It contrasts dependable, disciplined individuals with those who are lackadaisical and disorganized. Conscientiousness also reflects the level of self-control and the ability to delay gratification. Impulsiveness is not necessarily bad, sometimes the environment requires quick decision-making. Impulsive individuals are often seen as fun, interesting companions. However, impulsive behavior often gets people into trouble, providing momentary gratification at the expense of long-term negative consequences, such as aggression or substance abuse. Impulsive people generally do not accomplish major achievements. Conscientious people more easily avoid trouble and achieve greater success. They are generally seen as intelligent and reliable, although highly conscientious people may be perfectionists or workaholics. Extremely prudent individuals can seem monotonous, dull, and lifeless.\n",
        "\n",
        "Conscientiousness can be divided into six facets:\n",
        "\n",
        "C1 COMPETENCE\n",
        "\n",
        "Refers to the sense that one is capable, sensible, prudent, and effective. High scorers feel well-prepared to deal with life. Low scorers have a lower opinion of their abilities, admitting that they are often unprepared and inept.\n",
        "\n",
        "High scorers: Confident in own abilities. Efficient, thorough, confident, intelligent.\n",
        "\n",
        "Low scorers: Lack confidence in own abilities, do not feel in control of work and life. Confused, forgetful, foolish.\n",
        "\n",
        "C2 ORDER\n",
        "\n",
        "High scorers are neat, tidy, well-organized, they put things in their proper places. Low scorers cannot organize things well, describe themselves as unmethodical.\n",
        "\n",
        "High scorers: Well-organized, like making plans and following them. Precise, efficient, methodical.\n",
        "\n",
        "Low scorers: Lack planning and orderliness, appear haphazard. Disorderly, impulsive, careless.\n",
        "\n",
        "C3 DUTIFULNESS\n",
        "\n",
        "To some extent, dutifulness refers to adherence to one's conscience, assessed by this facet. High scorers strictly follow their moral principles and scrupulously fulfill their moral obligations. Low scorers are more casual about such matters, somewhat unreliable or undependable.\n",
        "\n",
        "High scorers: Dutiful, follow the rules. Reliable, polite, organized, thorough.\n",
        "\n",
        "Low scorers: Feel restricted by rules and regulations. Often seen by others as unreliable, irresponsible. Careless, thoughtless, distracted.\n",
        "\n",
        "C4 ACHIEVEMENT STRIVING\n",
        "\n",
        "High scorers have high aspiration levels and work hard to achieve their goals. They are industrious, purposeful, and have a sense of direction. Low scorers are lackadaisical, even lazy, lacking motivation to succeed, having no ambitions and appearing to drift aimlessly. But they are often quite satisfied with their modest level of accomplishment.\n",
        "\n",
        "High scorers: Striving for success and excellence, often seen as workaholics. Ambitious, industrious, enterprising, persevering.\n",
        "\n",
        "Low scorers: Satisfied with completing basic tasks, seen as lazy by others. Leisurely, daydreaming, aimless.\n",
        "\n",
        "C5 SELF-DISCIPLINE\n",
        "\n",
        "Refers to the ability to begin tasks and follow through despite boredom or distractions. High scorers can motivate themselves to get the job done. Low scorers procrastinate in starting routine chores, easily losing confidence and giving up.\n",
        "\n",
        "Low scorers: Procrastinate in work, often do not finish tasks, easily discouraged by obstacles. Unambitious, forgetful, distracted.\n",
        "\n",
        "C6 DELIBERATION\n",
        "\n",
        "This facet pertains to the tendency to think carefully before acting. High scorers are cautious and deliberate. Low scorers are hasty and speak/act without considering consequences.\n",
        "\n",
        "High scorers: Think before acting, not impulsive. Prudent, logical, mature.\n",
        "\n",
        "Low scorers: Act without considering consequences, impulsive, speak thoughts as they occur. Immature, rash, impulsive, careless.\n",
        "\"\"\"\n",
        "\n",
        "big5_to_detail['openness'] = \"\"\"Openness describes a person's cognitive style. Openness to experience is defined as: the proactive seeking and appreciation of experience for its own sake, and tolerance for and exploration of the unfamiliar. This dimension contrasts intellectually curious, creative people open to novelty with traditional, down-to-earth, closed-minded individuals lacking artistic interests. Open people prefer abstract thinking, have wide interests. Closed people emphasize the concrete, conventional, are more traditional and conservative. Open people are suited to professions like teaching, closed people to occupations like police, sales, service.\n",
        "\n",
        "Openness can be divided into six facets:\n",
        "\n",
        "O1 FANTASY\n",
        "\n",
        "Open people have vivid imaginations and active fantasy lives. Their daydreams are not just escapes, but ways to create interesting inner worlds. They elaborate and flesh out their fantasies, and believe imagination is essential for a rich, creative life. Low scorers are more prosaic, keeping their minds on the task at hand.\n",
        "\n",
        "High scorers: Find the real world too plain and ordinary. Enjoy imagining, creating a more interesting, enriching world. Imaginative, daydreaming.\n",
        "\n",
        "Low scorers: Matter-of-fact, prefers real-world thinking. Practical, prefer concrete thought.\n",
        "\n",
        "O2 AESTHETICS\n",
        "\n",
        "High scorers have deep appreciation for art and beauty. They are moved by poetry, absorbed in music, and touched by art. They may not have artistic talent or refined taste, but most have strong interests that enrich their experience. Low scorers are relatively insensitive and indifferent to art and beauty.\n",
        "\n",
        "High scorers: Appreciate beauty in nature and the arts. Value aesthetic experiences, touched by art and beauty.\n",
        "\n",
        "Low scorers: Insensitive to beauty, disinterested in the arts. Insensitive to art, cannot understand it.\n",
        "\n",
        "O3 FEELINGS\n",
        "\n",
        "Refers to receptivity to one's own inner feelings, evaluating emotions as an important part of life. High scorers experience deeper emotional states and can differentiate among them, experiencing happiness and unhappiness more intensely than others. Low scorers have blunted affect and do not believe feelings are important.\n",
        "\n",
        "High scorers: Aware of own emotions/inner life. Sensitive, empathic, give importance to own feelings.\n",
        "\n",
        "Low scorers: Little access to own emotions/inner world, unwilling to articulate them. Narrow range of emotions, insensitive to context.\n",
        "\n",
        "O4 ACTIONS\n",
        "\n",
        "Openness is seen behaviorally as willingness to try different activities, visit new places, or sample exotic foods. High scorers prefer novelty and variety to familiarity and routine. Over time, one high scorer may have a wide array of interests. Low scorers find change difficult and prefer to stick with known activities.\n",
        "\n",
        "High scorers: Like experiencing new things, traveling, seeking different experiences. Find routine boring, willing to try new things. Seek novelty/variety, try new activities.\n",
        "\n",
        "Low scorers: Uncomfortable with unfamiliar, prefer familiar surroundings/people. Set in ways, like familiar things.\n",
        "\n",
        "O5 IDEAS\n",
        "\n",
        "Intellectual curiosity is an aspect of openness, manifested both in pursuit of intellectual interests for their own sake and in open-mindedness to new, unconventional ideas. High scorers enjoy philosophical arguments and brain teasers. Low scorers have narrower intellectual interests, and even if they are intelligent concentrate their abilities within limited areas.\n",
        "\n",
        "High scorers: Inquisitive, analytical, theoretical.\n",
        "\n",
        "Low scorers: Pragmatic, fact-oriented, unappreciative of mental challenges.\n",
        "\n",
        "O6 VALUES\n",
        "\n",
        "Openness to values means constantly reexamining social, political, and religious values. Closed people tend to accept authority, honor tradition, and as a result are conservative regardless of political affiliation.\n",
        "\n",
        "High scorers: Enjoy challenging authority, convention, traditional ideas. At the extreme, show hostility toward rules, sympathize with law-breakers. Tolerant, broad-minded, nonconforming.\n",
        "\n",
        "Low scorers: Prefer the stability and security of authority and convention, unwilling to challenge the status quo. Dogmatic, conservative, conforming.\n",
        "\"\"\"\n",
        "\n",
        "\n",
        "big5_to_detail['agreeableness'] = \"\"\"Agreeableness assesses the degree to which an individual is likable, while Agreeableness examines an individual's attitudes toward others, encompassing both a compassionate, sympathetic orientation along with antagonism, distrust, indifference. This facet represents the broad interpersonal orientation. Agreeableness represents \"love\", how much value is placed on cooperation and social harmony.\n",
        "\n",
        "High scorers are good-natured, friendly, generous, helpful, willing to compromise their interests for others. They have an optimistic view of human nature, believing others to be honest, decent and trustworthy. Low scorers place self-interest above helping others. They are generally unconcerned with others' well-being and therefore unwilling to extend themselves for others. At times, they are overly suspicious of others' motives. For certain occupations, high agreeableness may not be optimal, especially when toughness and objective judgment is required, such as scientists, critics, and soldiers.\n",
        "\n",
        "Agreeableness can be divided into six facets:\n",
        "\n",
        "A1 TRUST\n",
        "\n",
        "High scorers believe others are honest and well-intentioned. Low scorers tend to be cynical and suspicious, assuming others to be dishonest and dangerous.\n",
        "\n",
        "High scorers: Believe others to be honest, reliable and well-meaning. Forgiving, trusting of others, good-natured.\n",
        "\n",
        "Low scorers: View others as selfish, dangerous, looking to take advantage. Cautious, pessimistic, suspicious, hardhearted.\n",
        "\n",
        "A2 STRAIGHTFORWARDNESS\n",
        "\n",
        "High scorers are frank, sincere, ingenuous. Low scorers are more willing to manipulate others through flattery, craftiness, deception. They see it as an essential social skill and view straightforward people as naive.\n",
        "\n",
        "High scorers: Believe there is no need for pretense in dealing with others, appear straightforward, genuine. Direct, frank, open, ingenuous.\n",
        "\n",
        "Low scorers: Tend to be guarded in dealings with others, defensive, unwilling to reveal their full hand. Shrewd, slick, charming.\n",
        "\n",
        "A3 ALTRUISM\n",
        "\n",
        "High scorers actively concern themselves with others' welfare as evidenced by generosity, consideration of others, and a willingness to assist those in need. Low scorers are more self-centered and reluctant to get involved in others' problems.\n",
        "\n",
        "High scorers: Willing to assist others, find helping others rewarding. Warm-hearted, soft-hearted, mild-mannered, generous, kindhearted.\n",
        "\n",
        "Low scorers: Unwilling to help others, find helping a burden. Selfish, misanthropic, hardhearted, ungenerous.\n",
        "\n",
        "A4 COMPLIANCE\n",
        "\n",
        "Relates to personality in the context of interpersonal conflict. High scorers tend to defer to others, inhibit aggression, forgive and forget. Compliant people are gentle, mild-mannered. Low scorers are aggressive, preferring competition over cooperation, and unhesitatingly express anger when necessary.\n",
        "\n",
        "High scorers: Dislike conflict with others, willing to give up their own standpoint or deny their own needs to get along. Deferent, accommodating, obliging.\n",
        "\n",
        "Low scorers: Do not mind conflict with others, willing to intimidate others to achieve their goals. Stubborn, making unreasonable demands, obstinate, hardhearted.\n",
        "\n",
        "A5 MODESTY\n",
        "\n",
        "High scorers are self-effacing, humble. Low scorers believe they are superior and may be seen by others as arrogant, egotistical.\n",
        "\n",
        "High scorers: Modest, unassuming.\n",
        "\n",
        "Low scorers: Assertive, arrogant, vain, rude.\n",
        "\n",
        "A6 TENDER-MINDEDNESS\n",
        "\n",
        "Measures attitudes of sympathy and concern for others. High scorers are moved by others' needs and advocate humane social policies. Low scorers are hardheaded, unmoved by appeals to pity. They pride themselves on making objective appraisals based on cool logic.\n",
        "\n",
        "High scorers: Sympathetic, moved by others' suffering, express pity. Friendly, warm-hearted, gentle, soft-hearted.\n",
        "\n",
        "Low scorers: Do not strongly feel others' pain, pride themselves on objectivity, more concerned with truth and fairness than mercy. Callous, hardhearted, opinionated, ungenerous.\n",
        "\"\"\"\n",
        "\n",
        "big5_to_detail['extraversion'] = \"\"\"Extraversion represents the quantity and intensity of interpersonal interaction, the need for stimulation, and capacity for joy. This dimension contrasts social, outgoing, action-oriented individuals with reserved, sober, shy, silent types. This trait can be measured through two facets: the level of interpersonal involvement and the activity level. The former evaluates the degree to which an individual enjoys the company of others. The latter reflects an individual's personal pace and vigor.\n",
        "\n",
        "Extraverted people enjoy interacting with others, are full of energy, and often experience positive emotions. They are enthusiastic, enjoy physical activities, and like excitement and adventure. In a group, they are very talkative, confident, and enjoy being the center of attention.\n",
        "\n",
        "Introverted people are quieter, more cautious, and do not enjoy too much interaction with the outside world. Their lack of desire for interaction should not be confused with shyness or depression, it is simply because compared to extraverts, they do not need as much stimulation and prefer being alone. An introvert's tendencies are sometimes wrongly viewed as arrogance or unfriendliness, but they are often very kind people once you get to know them.\n",
        "\n",
        "Extraversion can be divided into the following six facets:\n",
        "\n",
        "E1 WARMTH\n",
        "\n",
        "Most relevant to interpersonal intimacy. Warm people are affectionate and friendly. They genuinely like others and easily form close relationships. Low scorers are not necessarily hostile or lacking in compassion, but are more formal, reserved, and detached in their behavior.\n",
        "\n",
        "High scorers: Warm people enjoy those around them and often express positive, friendly emotions towards others. They are good at making friends and forming intimate relationships. Sociable, talkative, affectionate.\n",
        "\n",
        "Low scorers: Although not necessarily cold or unfriendly, they are often seen as distant by others.\n",
        "\n",
        "E2 GREGARIOUSNESS\n",
        "\n",
        "Refers to a preference for other people's company. Gregarious people enjoy the company of others and the more people the merrier. Low scorers tend to be loners, they do not seek out and even actively avoid social stimulation.\n",
        "\n",
        "High scorers: Enjoy being with people, prefer lively, crowded settings. Outgoing, having many friends, seek social affiliations.\n",
        "\n",
        "Low scorers: Avoid crowds, find them draining. Prefer having more time alone, having their own personal space. Avoid crowds, enjoy solitude.\n",
        "\n",
        "E3 ASSERTIVENESS\n",
        "\n",
        "High scorers are dominant, forceful, and socially ascendant. They speak without hesitation and often become group leaders. Low scorers prefer to stay quietly in the background and let others do the talking.\n",
        "\n",
        "High scorers: Like occupying a position of social dominance, directing others, influencing others' behavior. Dominant, forceful, confident, decisive.\n",
        "\n",
        "Low scorers: Talk little in groups, allow others to occupy the dominant role. Modest, shy, quiet, reserved.\n",
        "\n",
        "E4 ACTIVITY\n",
        "\n",
        "High scorers lead fast-paced, vigorous lives, are energetic, and have a need to keep busy. Low scorers are more leisurely and relaxed, but not necessarily lazy or sluggish.\n",
        "\n",
        "High scorers: Fast-paced, busy in work and life. Appear energetic, enjoy participating in many activities. Energetic, fast-paced, vigorous.\n",
        "\n",
        "Low scorers: Slow-paced, leisurely in work and life. Unhurried, deliberate, composed.\n",
        "\n",
        "E5 EXCITEMENT SEEKING\n",
        "\n",
        "High scorers crave excitement and stimulation, enjoy bright colors and noisy environments. Excitement seeking is akin to some aspects of sensation seeking. Low scorers have little need for thrills and prefer activities seen as dull by the former.\n",
        "\n",
        "High scorers: Easily get bored without stimulation, enjoy loud music/noise, like adventure, seek thrills. Flashy, seek intense experiences, thrill-seeking.\n",
        "\n",
        "Low scorers: Avoid noise and crowds, dislike risky ventures. Cautious, sedate, not interested in thrills.\n",
        "\n",
        "E6 POSITIVE EMOTIONS\n",
        "\n",
        "Reflects the tendency to experience positive emotions (joy, happiness, love, excitement).\n",
        "\n",
        "High scorers: Easily experience various positive moods like joy, optimism, excitement. Cheerful, elated, optimistic.\n",
        "\n",
        "Low scorers: Do not readily experience positive emotions, but does not mean they often feel negative emotions either. Low scorers are simply less prone to excitement. Unemotional, calm, serious.\n",
        "\"\"\"\n",
        "\n",
        "big5_to_detail['neuroticism'] = \"\"\"Neuroticism or Emotional Stability: Having tendencies of anxiety, hostility, depression, self-consciousness, impulsiveness, vulnerability.\n",
        "\n",
        "N1 ANXIETY\n",
        "\n",
        "Anxious individuals tend to worry, fear, be easily concerned, tense, and oversensitive. Those who score high are more likely to have free-floating anxiety and apprehension. Those with low scores tend to be calm, relaxed. They do not constantly worry about things that might go wrong.\n",
        "\n",
        "High scorers: Anxiety, easily feel danger and threats, tend to be tense, fearful, worried, uneasy.\n",
        "\n",
        "Low scorers: Calm state of mind, relaxed, not easily scared, won't always worry about things that could go wrong, emotions are calm, relaxed, stable.\n",
        "\n",
        "N2 ANGRY HOSTILITY\n",
        "\n",
        "Reflects the tendency to experience anger and related states (e.g. frustration, bitterness). Measures the ease with which an individual experiences anger.\n",
        "\n",
        "High scorers: Easily angered, resentful when feeling unfairly treated, irritable, angry, frustrated.\n",
        "\n",
        "Low scorers: Not easily angered/provoked, friendly, even-tempered, not easily provoked to anger.\n",
        "\n",
        "N3 DEPRESSION\n",
        "\n",
        "Measures individual differences in the tendency to experience depressive affect. High scorers are prone to feelings of guilt, sadness, hopelessness, and loneliness. They are easily discouraged and often feel blue. Low scorers rarely experience these emotions.\n",
        "\n",
        "High scorers: Despairing, guilty, gloomy, dejected. Prone to feeling sorrow, abandonment, discouraged. Prone to feelings of guilt, sadness, disappointment, and loneliness. Easily discouraged, often feeling down.\n",
        "\n",
        "Low scorers: Not prone to feeling sad, rarely feels abandoned.\n",
        "\n",
        "N4 SELF-CONSCIOUSNESS\n",
        "\n",
        "Core is shyness and embarrassability. Such individuals feel uncomfortable in groups, are sensitive to ridicule, and prone to feelings of inferiority. Self-consciousness is similar to shyness and social anxiety. Low scorers are not necessarily well-mannered or skilled socially, they are simply less disrupted by awkward social situations.\n",
        "\n",
        "High scorers: Too concerned with what others think, afraid of being laughed at, tend to feel shy, anxious, inferior, awkward in social situations.\n",
        "\n",
        "Low scorers: Composed, confident in social situations, not easily made tense or shy.\n",
        "\n",
        "N5 IMPULSIVENESS\n",
        "\n",
        "Refers to control over cravings and urges. Individuals give in too easily to impulses and temptations to care for long-term consequences (e.g. for food, cigarettes, possessions), although they will regret their actions later. Low scorers are better able to resist temptation and have higher frustration tolerance.\n",
        "\n",
        "High scorers: Cannot resist cravings when experiencing strong urges, tend to pursue short-term satisfaction without considering long-term consequences. Cannot resist temptations, rash, spiteful, self-centered.\n",
        "\n",
        "Low scorers: Self-controlled, can resist temptation.\n",
        "\n",
        "N6 VULNERABILITY\n",
        "\n",
        "Refers to susceptibility to stress. High scorers have difficulty coping with stress, become dependent, hopeless, panicked when facing emergencies. Low scorers feel that they can handle difficult situations properly.\n",
        "\n",
        "High scorers: Under stress, easily feel panic, confusion, helpless, cannot cope with stress.\n",
        "\n",
        "Low scorers: Under stress, feel calm, confident. Resilient, clear-headed, brave.\"\"\"\n"
      ],
      "metadata": {
        "cellView": "form",
        "id": "VxS2YYWJaUDe"
      },
      "execution_count": 10,
      "outputs": []
    },
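    {
      "cell_type": "code",
      "source": [
        "# The prompts are written in Chinese, matching the Chinese questionnaire items.\n",
        "# prefix_prompt: \"You are playing the role of a senior psychologist.\"\n",
        "prefix_prompt = \"你扮演一个资深的心理学家。\"\n",
        "\n",
        "# suffix_prompt asks the model to judge, for each interview question, whether it can\n",
        "# correctly measure the target trait, to reason over hypothetical answers in the form\n",
        "# \"if the subject answers xxx, that means xxx\", and to return one JSON object per\n",
        "# question with the fields problem, if_plausible, related_sub_factor (e.g. O1, C2), reason.\n",
        "suffix_prompt = \"\"\"我已经设计了一些访谈的问题，请为我评估这些访谈问题中的每一个，在访谈中是否能够正确测试出被试对应人格特质的程度\n",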
        "以及假设一些被试的回答，在reason中用\"如果被试回答xxx，那就意味着xxx\"的语气进行评估。输出成jsonl格式，每个问题的评估包含\n",
        "\"problem\",\"if_plausible\",\"related_sub_factor\",\"reason\"四个字段。\n",
        "- if_plausible输出True或者False\n",
        "- related_sub_factor使用O1, C2这样的字母表示\n",
        "输出为\n",
        "{\n",
        "  \"problem\": <repeat the problem I designed>\n",
        "  \"if_plausible\": <true or false>,\n",
        "  \"related_sub_factor\": <related_sub_factor,指出problem对应能够测试被试的哪个子维度的人格特质>,\n",
        "  \"reason\": <summarize the reason>\n",
        "}\n",
        "这样的格式:\n",
        "\n",
        "\"\"\"\n",
        "\n"
      ],
      "metadata": {
        "id": "rJQ1qS9ia3kG"
      },
      "execution_count": 143,
      "outputs": []
    },
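    {
      "cell_type": "code",
      "source": [
        "import os\n",
        "from getpass import getpass\n",
        "\n",
        "# Do not hard-code the API key in the notebook: reuse OPENAI_API_KEY if it is\n",
        "# already set, otherwise prompt for it without echoing the value.\n",
        "if not os.environ.get(\"OPENAI_API_KEY\"):\n",
        "    os.environ[\"OPENAI_API_KEY\"] = getpass(\"OpenAI API key: \")"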
      ],
      "metadata": {
        "id": "1pBDDIKYdNZz"
      },
      "execution_count": 144,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "from langchain.chat_models import ChatOpenAI\n",
        "from langchain.prompts.chat import (\n",
        "    ChatPromptTemplate,\n",
        "    SystemMessagePromptTemplate,\n",
        "    AIMessagePromptTemplate,\n",
        "    HumanMessagePromptTemplate,\n",
        ")\n",
        "from langchain.schema import (\n",
        "    AIMessage,\n",
        "    HumanMessage,\n",
        "    SystemMessage\n",
        ")"
      ],
      "metadata": {
        "id": "09jhEovndHiW"
      },
      "execution_count": 145,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "chat = ChatOpenAI(model=\"gpt-3.5-turbo\", temperature = 0.0)"
      ],
      "metadata": {
        "id": "Fm2DdNKHdMNr"
      },
      "execution_count": 146,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "smart_system_prompt = \"\"\"You are ChatGPT, a large language model trained by OpenAI.\n",
        "Knowledge cutoff: 2021-09\n",
        "Current date: 2023-03-15\"\"\""
      ],
      "metadata": {
        "id": "scUn2bX0dgj4"
      },
      "execution_count": 147,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "import tiktoken\n",
        "\n",
        "enc = tiktoken.get_encoding(\"cl100k_base\")"
      ],
      "metadata": {
        "id": "vvpIKIYud9KO"
      },
      "execution_count": 148,
      "outputs": []
    },
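    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The encoder can be used to estimate how long a prompt is before it is sent. A minimal sketch, assuming the `big5_to_detail` dictionary defined earlier in this notebook; `count_tokens` is a helper name introduced here for illustration, and the result is only an estimate because a chat request adds a few tokens of message framing on top of the text itself."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import tiktoken\n",
        "\n",
        "enc = tiktoken.get_encoding('cl100k_base')\n",
        "\n",
        "def count_tokens(text):\n",
        "    # Number of cl100k_base tokens in the given text.\n",
        "    return len(enc.encode(text))\n",
        "\n",
        "# big5_to_detail comes from the definitions cell earlier in this notebook.\n",
        "for factor, detail in big5_to_detail.items():\n",
        "    print(factor, count_tokens(detail))"
      ]
    },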
    {
      "cell_type": "code",
      "source": [
        "whole_message = prefix_prompt\n",
        "\n",
        "# Choose the factor first: the filter below depends on it.\n",
        "current_factor = long_name['O']\n",
        "\n",
        "problems = [x for x in datas if x['factor'] == current_factor]\n",
        "\n",
        "target_message = f\"我正在设计一个心理学的实验，我希望通过访谈，去评估被试在大五人格中 {current_factor}的程度\"\n",
        "\n",
        "whole_message += '\\n'\n",
        "whole_message += target_message\n",
        "whole_message += '\\n'\n",
        "\n",
        "whole_message += big5_to_detail[current_factor]\n",
        "\n",
        "whole_message += suffix_prompt\n",
        "\n",
        "whole_message += \"我设计的问题如下:\\n\"\n",
        "\n",
        "for prob_id in range(4):\n",
        "    whole_message += '- ' + problems[prob_id]['question']\n",
        "    whole_message += '\\n'\n",
        "\n",
        "\n",
        "messages = [\n",
        "    SystemMessage(content=smart_system_prompt),\n",
        "    HumanMessage(content=whole_message),\n",
        "]"
      ],
      "metadata": {
        "id": "0tSW-LAld_Lr"
      },
      "execution_count": 149,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "response = chat(messages)\n",
        "print(response.content)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "CQL854nifhRD",
        "outputId": "889de47a-6cbf-42c4-d4ad-3e9a50ae2f03"
      },
      "execution_count": 150,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "{\n",
            "  \"problem\": \"我不喜欢浪费时间去做白日梦。\",\n",
            "  \"if_plausible\": false,\n",
            "  \"related_sub_factor\": \"O1\",\n",
            "  \"reason\": \"如果被试回答不喜欢浪费时间去做白日梦，那就意味着他们在O1（FANTASY）子维度上得分较低。高分者通常有丰富的想象力和活跃的幻想生活，而低分者则更注重现实和实际思维。\"\n",
            "}\n",
            "\n",
            "{\n",
            "  \"problem\": \"大自然和艺术的规律形态使我感到极为奥妙。\",\n",
            "  \"if_plausible\": true,\n",
            "  \"related_sub_factor\": \"O2\",\n",
            "  \"reason\": \"如果被试回答大自然和艺术的规律形态使我感到极为奥妙，那就意味着他们在O2（AESTHETICS）子维度上得分较高。高分者通常对艺术和美有深刻的欣赏，被艺术和美所感动。低分者则相对不敏感和漠不关心艺术和美。\"\n",
            "}\n",
            "\n",
            "{\n",
            "  \"problem\": \"我对诗词只有少许感觉甚至无动於衷。\",\n",
            "  \"if_plausible\": false,\n",
            "  \"related_sub_factor\": \"O2\",\n",
            "  \"reason\": \"如果被试回答对诗词只有少许感觉甚至无动於衷，那就意味着他们在O2（AESTHETICS）子维度上得分较低。高分者通常对艺术和美有深刻的欣赏，被艺术和美所感动。低分者则相对不敏感和漠不关心艺术和美。\"\n",
            "}\n",
            "\n",
            "{\n",
            "  \"problem\": \"当我阅读一首诗或欣赏一件艺术品时，我有时会感到兴奋或惊喜。\",\n",
            "  \"if_plausible\": true,\n",
            "  \"related_sub_factor\": \"O2\",\n",
            "  \"reason\": \"如果被试回答当我阅读一首诗或欣赏一件艺术品时，我有时会感到兴奋或惊喜，那就意味着他们在O2（AESTHETICS）子维度上得分较高。高分者通常对艺术和美有深刻的欣赏，被艺术和美所感动。低分者则相对不敏感和漠不关心艺术和美。\"\n",
            "}\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "我现在有一个字符串\n",
        "\n",
        "response = \"\"\"\n",
        "\n",
        "noise\n",
        "\n",
        "{\n",
        "  \"if_plausible\": true,\n",
        "  \"related_sub_factor\": \"O1\",\n",
        "  \"reason\": \"If the participant agrees with the statement and expresses a dislike for daydreaming, it suggests a lower level of openness to fantasy (O1).\"\n",
        "}\n",
        "\n",
        "{\n",
        "  \"if_plausible\": true,\n",
        "  \"related_sub_factor\": \"O2\",\n",
        "  \"reason\": \"If the participant agrees with the statement and expresses a deep appreciation for the beauty and mystery of nature and art, it suggests a higher level of openness to aesthetics (O2).\"\n",
        "}\n",
        "\n",
        "noise string\n",
        "\"\"\"\n",
        "\n",
        "我想提取其中的json格式的信息，请为我实现一个python函数，输入为这样的字符串，输出为list of json"
      ],
      "metadata": {
        "id": "R4iSfuUWhaU1"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "import json\n",
        "import re\n",
        "\n",
        "def extract_json(text):\n",
        "    json_strings = re.findall(r'{.*?}', text, re.DOTALL)\n",
        "    json_data = []\n",
        "    for json_str in json_strings:\n",
        "        json_data.append(json.loads(json_str))\n",
        "\n",
        "    if len(json_data) != 0:\n",
        "        return json_data\n",
        "\n",
        "    # pattern = r\"\\n- if_plausible:\\s*(.*?)\\s*- related_sub_factor:\\s*(.*?)\\s*- reason:\\s*(.*?)\\s*\"\n",
        "    # matches = re.findall(pattern, response, re.DOTALL)\n",
        "\n",
        "    # json_data = []\n",
        "    # for mm in matches:\n",
        "    #     plausible, sub_factor, reason = mm\n",
        "    #     question_info = {\n",
        "    #         \"if_plausible\": plausible,\n",
        "    #         \"related_sub_factor\": sub_factor,\n",
        "    #         \"reason\": reason\n",
        "    #     }\n",
        "    #     json_data.append(question_id)\n",
        "\n",
        "    return json_data\n",
        "\n",
        "# print(extract_json(response.content))"
      ],
      "metadata": {
        "id": "GNgaMIzZeJXq"
      },
      "execution_count": 151,
      "outputs": []
    },
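    {
      "cell_type": "markdown",
      "source": [
        "The regular expression `{.*?}` above ends each match at the first `}`, so an object whose `reason` text contains a brace would be cut short. The cell below is a minimal sketch of an alternative, assuming the same kind of noisy input: it walks the string with the standard-library `json.JSONDecoder.raw_decode`, which parses one JSON value starting at a given offset. The name `extract_json_objects` and the sample string are hypothetical and not part of the original notebook."
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "import json\n",
        "\n",
        "def extract_json_objects(text):\n",
        "    \"\"\"Hypothetical helper: collect every JSON object embedded in noisy text.\"\"\"\n",
        "    decoder = json.JSONDecoder()\n",
        "    found = []\n",
        "    pos = text.find('{')\n",
        "    while pos != -1:\n",
        "        try:\n",
        "            # raw_decode parses one JSON value starting at pos and returns (value, end_index)\n",
        "            obj, end = decoder.raw_decode(text, pos)\n",
        "        except json.JSONDecodeError:\n",
        "            # No valid JSON starts here; try the next opening brace\n",
        "            pos = text.find('{', pos + 1)\n",
        "            continue\n",
        "        found.append(obj)\n",
        "        pos = text.find('{', end)\n",
        "    return found\n",
        "\n",
        "# Made-up sample: the reason value contains a closing brace\n",
        "sample = 'noise\\n{\"if_plausible\": true, \"reason\": \"a } inside\"}\\nnoise string'\n",
        "print(extract_json_objects(sample))"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },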
    {
      "cell_type": "markdown",
      "source": [
        "我希望从类似以下的文本中\n",
        "\n",
        "批量提取信息成为jsonl，分别提取if_plausible, related_sub_factor和reason三个字段\n",
        "\n",
        "response = \"\"\"\n",
        "作为资深心理学家，我将为您评估您设计的访谈问题，以确定它们是否能够正确测试被试的神经质程度。以下是每个问题的评估结果：\n",
        "\n",
        "问题1: \"我不是一个充满烦恼的人。\"\n",
        "- if_plausible: True\n",
        "- related_sub_factor: N1\n",
        "- reason: 如果被试回答\"True\"，那就意味着他们在神经质因素N1（焦虑）方面得分较低。他们可能更倾向于保持冷静和放松，不经常为可能出错的事情担心。\n",
        "\n",
        "问题2: \"我很少感到恐惧及焦虑。\"\n",
        "- if_plausible: True\n",
        "- related_sub_factor: N1\n",
        "- reason: 如果被试回答\"True\"，那就意味着他们在神经质因素N1（焦虑）方面得分较低。他们可能不容易感到恐惧、担心或焦虑，情绪相对较为稳定。\n",
        "\"\"\"\n",
        "\n",
        "请为我用python实现\n"
      ],
      "metadata": {
        "id": "xzd598JVsT8Y"
      }
    },
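    {
      "cell_type": "markdown",
      "source": [
        "One possible way to handle the bullet-style reply above is a regular expression over the three `- key: value` lines. The cell below is only a sketch under stated assumptions: each field sits on its own line and the fields always appear in the order if_plausible, related_sub_factor, reason. The names `BULLET_PATTERN` and `extract_bullets` and the sample string are hypothetical and not part of the original notebook."
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "import re\n",
        "\n",
        "# Assumes each field sits on its own line, in this fixed order\n",
        "BULLET_PATTERN = re.compile(\n",
        "    r'-\\s*if_plausible:\\s*(?P<if_plausible>.*?)\\s*\\n'\n",
        "    r'\\s*-\\s*related_sub_factor:\\s*(?P<related_sub_factor>.*?)\\s*\\n'\n",
        "    r'\\s*-\\s*reason:\\s*(?P<reason>.*?)\\s*(?:\\n|$)'\n",
        ")\n",
        "\n",
        "def extract_bullets(text):\n",
        "    \"\"\"Hypothetical helper: return one dict per evaluated question.\"\"\"\n",
        "    results = []\n",
        "    for m in BULLET_PATTERN.finditer(text):\n",
        "        results.append({\n",
        "            # The model prints True/False as text; convert it to a bool\n",
        "            'if_plausible': m.group('if_plausible').lower() == 'true',\n",
        "            'related_sub_factor': m.group('related_sub_factor'),\n",
        "            'reason': m.group('reason'),\n",
        "        })\n",
        "    return results\n",
        "\n",
        "# Made-up sample in the same layout as the text above\n",
        "sample = 'Question 1: ...\\n- if_plausible: True\\n- related_sub_factor: N1\\n- reason: low anxiety\\n'\n",
        "print(extract_bullets(sample))"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },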
    {
      "cell_type": "code",
      "source": [
        "# prob_in_chunk = []\n",
        "# response_in_chunk = []"
      ],
      "metadata": {
        "id": "9eAxHy7nlxBz"
      },
      "execution_count": 152,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "from tqdm import tqdm\n",
        "\n",
        "count = 0\n",
        "\n",
        "for short_name in long_name.keys():\n",
        "\n",
        "    current_factor = long_name[short_name]\n",
        "\n",
        "    problems = [x for x in datas if x['factor'] == current_factor]\n",
        "\n",
        "    for i in tqdm(range(0, len(problems), 4)):\n",
        "        selected_probs = problems[i:i+4]\n",
        "\n",
        "        whole_message = prefix_prompt\n",
        "        target_message = f\"我正在设计一个心理学的实验，我希望通过访谈，去评估被试在大五人格中 {current_factor}的程度\"\n",
        "\n",
        "        whole_message += '\\n'\n",
        "        whole_message += target_message\n",
        "        whole_message += '\\n'\n",
        "\n",
        "        whole_message += big5_to_detail[current_factor]\n",
        "\n",
        "        whole_message += suffix_prompt\n",
        "\n",
        "        whole_message += \"我设计的问题如下:\\n\"\n",
        "\n",
        "        #TODO : 改为从problems中4个一组抽取\n",
        "        for prob in selected_probs:\n",
        "            whole_message += '- ' + prob['question']\n",
        "            whole_message += '\\n'\n",
        "\n",
        "        count = count + 1\n",
        "\n",
        "        if count <= len(prob_in_chunk):\n",
        "            continue\n",
        "\n",
        "        prob_in_chunk.append(selected_probs)\n",
        "\n",
        "\n",
        "        messages = [\n",
        "            SystemMessage(content=smart_system_prompt),\n",
        "            HumanMessage(content=whole_message),\n",
        "        ]\n",
        "\n",
        "        response = chat(messages)\n",
        "        response_in_chunk.append(response.content)\n",
        "\n"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "GIiuvo32hTOI",
        "outputId": "67374c14-97f3-4e9d-cf54-a0d365156674"
      },
      "execution_count": 153,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "100%|██████████| 6/6 [02:20<00:00, 23.35s/it]\n",
            "100%|██████████| 5/5 [02:42<00:00, 32.49s/it]\n",
            "100%|██████████| 6/6 [02:46<00:00, 27.80s/it]\n",
            "100%|██████████| 6/6 [03:33<00:00, 35.64s/it]\n",
            "100%|██████████| 6/6 [03:07<00:00, 31.27s/it]\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# prob_in_chunk = []\n",
        "# response_in_chunk = []\n",
        "\n",
        "print(len(prob_in_chunk))\n",
        "print(len(response_in_chunk))"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "0LguTONI8DAz",
        "outputId": "a1fdda4c-0510-4b1c-c72b-f4f198e96cbc"
      },
      "execution_count": 178,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "29\n",
            "28\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "print(prob_in_chunk[-28])\n",
        "print(response_in_chunk[-28])"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "wQ-hWY_R8L4y",
        "outputId": "cf1288bf-10fe-43dd-e4f9-9f1f1827bb28"
      },
      "execution_count": 185,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "[{'id': 20, 'question': '我很少感到寂寞或忧郁。', 'factor': 'neuroticism', 'if_pos': 'false'}, {'id': 25, 'question': '有时我感到自已完全一文不值。', 'factor': 'neuroticism', 'if_pos': 'true'}, {'id': 30, 'question': '我很少感到忧郁或沮丧。', 'factor': 'neuroticism', 'if_pos': 'false'}, {'id': 35, 'question': '很多时候，当事情不对劲时，我会感到挫败及想放弃。', 'factor': 'neuroticism', 'if_pos': 'true'}]\n",
            "{\n",
            "  \"problem\": \"我很少感到寂寞或忧郁。\",\n",
            "  \"if_plausible\": true,\n",
            "  \"related_sub_factor\": \"N3\",\n",
            "  \"reason\": \"如果被试回答很少感到寂寞或忧郁，那就意味着他们在N3（Depression）这个人格特质上得分较低，不容易体验到内疚、悲伤、绝望和孤独等情绪。\"\n",
            "}\n",
            "\n",
            "{\n",
            "  \"problem\": \"有时我感到自己完全一文不值。\",\n",
            "  \"if_plausible\": true,\n",
            "  \"related_sub_factor\": \"N3\",\n",
            "  \"reason\": \"如果被试回答有时感到自己完全一文不值，那就意味着他们在N3（Depression）这个人格特质上得分较高，容易体验到自责、悲伤、绝望和孤独等情绪，感觉自己没有价值。\"\n",
            "}\n",
            "\n",
            "{\n",
            "  \"problem\": \"我很少感到忧郁或沮丧。\",\n",
            "  \"if_plausible\": true,\n",
            "  \"related_sub_factor\": \"N3\",\n",
            "  \"reason\": \"如果被试回答很少感到忧郁或沮丧，那就意味着他们在N3（Depression）这个人格特质上得分较低，不容易体验到悲伤、绝望和沮丧等情绪。\"\n",
            "}\n",
            "\n",
            "{\n",
            "  \"problem\": \"很多时候，当事情不对劲时，我会感到挫败及想放弃。\",\n",
            "  \"if_plausible\": true,\n",
            "  \"related_sub_factor\": \"N3\",\n",
            "  \"reason\": \"如果被试回答很多时候，当事情不对劲时，会感到挫败及想放弃，那就意味着他们在N3（Depression）这个人格特质上得分较高，容易感到沮丧、失望和丧失动力。\"\n",
            "}\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "count = 0\n",
        "\n",
        "# 这里不知道为啥被我整错位了\n",
        "\n",
        "# for probs,response in zip(prob_in_chunk,response_in_chunk):\n",
        "\n",
        "for i in range(len(response_in_chunk)):\n",
        "    response = response_in_chunk[i]\n",
        "    probs = prob_in_chunk[i+1]\n",
        "    if len(probs) == len(extract_json(response)):\n",
        "        print(count , ' ok', len(probs))\n",
        "    else:\n",
        "        print(count , ' not ok', len(probs), len(extract_json(response)))\n",
        "    count += 1\n",
        "        # print(response)\n",
        "    # print()"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "2gPtV2qsq93b",
        "outputId": "26cb737f-3287-489d-d71a-577e52839599"
      },
      "execution_count": 202,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "0  ok 4\n",
            "1  ok 4\n",
            "2  ok 4\n",
            "3  ok 4\n",
            "4  ok 1\n",
            "5  ok 4\n",
            "6  ok 4\n",
            "7  ok 4\n",
            "8  ok 4\n",
            "9  ok 4\n",
            "10  ok 4\n",
            "11  ok 4\n",
            "12  ok 4\n",
            "13  ok 4\n",
            "14  ok 4\n",
            "15  ok 1\n",
            "16  ok 4\n",
            "17  ok 4\n",
            "18  ok 4\n",
            "19  ok 4\n",
            "20  ok 4\n",
            "21  ok 4\n",
            "22  ok 4\n",
            "23  ok 4\n",
            "24  ok 4\n",
            "25  ok 4\n",
            "26  ok 4\n",
            "27  ok 4\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "count = 0\n",
        "\n",
        "for i in range(len(response_in_chunk)):\n",
        "    response = response_in_chunk[i]\n",
        "    probs = prob_in_chunk[i+1]\n",
        "    if len(probs) == len(extract_json(response)):\n",
        "        # print(count , ' ok', len(probs))\n",
        "        count += len(probs)\n",
        "    else:\n",
        "        pass\n",
        "\n",
        "print(count)\n",
        "print(len(datas))\n"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "0bd29HOV_CA9",
        "outputId": "e824a167-7634-4e25-e728-6b3c7e7a4758"
      },
      "execution_count": 204,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "106\n",
            "110\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "import json\n",
        "\n",
        "save_name = '/content/questions_verified.jsonl'\n",
        "\n",
        "with open(save_name, 'w', encoding='utf-8') as f:\n",
        "    for i in range(len(response_in_chunk)):\n",
        "        response = response_in_chunk[i]\n",
        "        probs = prob_in_chunk[i+1]\n",
        "        if len(probs) == len(extract_json(response)):\n",
        "            for prob, res in zip(probs,extract_json(response)):\n",
        "                # print(prob)\n",
        "                # print(res)\n",
        "\n",
        "                data = {\n",
        "                    'id': prob['id'],\n",
        "                    'question': res['problem'],\n",
        "                    'factor': prob['factor'],\n",
        "                    'sub_factor': res['related_sub_factor'],\n",
        "                    'if_pos':prob['if_pos'],\n",
        "                    'if_plausible':res['if_plausible'],\n",
        "                    'reason': res['reason'],\n",
        "                    'prob_repeat':prob['question']\n",
        "                }\n",
        "\n",
        "                # datas.append(data)\n",
        "\n",
        "                json.dump(data, f, ensure_ascii=False)\n",
        "                f.write('\\n')\n",
        "        else:\n",
        "            print(' not ok', len(probs), len(extract_json(response)))\n",
        "        # break\n"
      ],
      "metadata": {
        "id": "6SGRzSak_c00"
      },
      "execution_count": 211,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "sel_id = 4\n",
        "print(prob_in_chunk[sel_id+1])\n",
        "print(response_in_chunk[sel_id])"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "2C7pRdOF7qfI",
        "outputId": "d51675fc-8548-41c6-a1d0-bde55809cb0b"
      },
      "execution_count": 187,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "[{'id': 77, 'question': '我也会有自卑的感觉。你是否也会感到自己不如别人?这种比较会让你有什么样的感受?', 'factor': 'neuroticism', 'if_pos': 'synthesis'}]\n",
            "\n",
            "评估结果:\n",
            "{\n",
            "  \"problem\": \"你平时是如何制定和实现目标的呢?有没有什么特别的方法可以提高目标的完成度呢?\",\n",
            "  \"if_plausible\": true,\n",
            "  \"related_sub_factor\": \"C4\",\n",
            "  \"reason\": \"这个问题涉及到被试在目标设定和实现方面的能力。如果被试提到他们制定明确的目标，并且有一套系统的方法来提高目标的完成度，那么可以推断被试在conscientiousness的facet C4（Achievement Striving）上得分较高。\"\n",
            "}\n",
            "\n",
            "问题2:\n",
            "- 你对追求卓越有什么独特的见解吗?怎样才能在生活中更进一步呢?\n",
            "\n",
            "评估结果:\n",
            "{\n",
            "  \"problem\": \"你对追求卓越有什么独特的见解吗?怎样才能在生活中更进一步呢?\",\n",
            "  \"if_plausible\": true,\n",
            "  \"related_sub_factor\": \"C1\",\n",
            "  \"reason\": \"这个问题涉及到被试对追求卓越的态度和方法。如果被试表达了对追求卓越的重视，并提供了一些具体的方法或见解来在生活中取得更进一步，那么可以推断被试在conscientiousness的facet C1（Competence）上得分较高。\"\n",
            "}\n",
            "\n",
            "问题3:\n",
            "- 我有时挺拖延的。你平时如何规划时间、提高执行效率的呢?有什么技巧可以帮助自己更准时完成任务吗?\n",
            "\n",
            "评估结果:\n",
            "{\n",
            "  \"problem\": \"我有时挺拖延的。你平时如何规划时间、提高执行效率的呢?有什么技巧可以帮助自己更准时完成任务吗?\",\n",
            "  \"if_plausible\": true,\n",
            "  \"related_sub_factor\": \"C5\",\n",
            "  \"reason\": \"这个问题涉及到被试在时间管理和执行效率方面的能力。如果被试提供了一些规划时间和提高执行效率的技巧，并强调准时完成任务的重要性，那么可以推断被试在conscientiousness的facet C5（Self-Discipline）上得分较高。\"\n",
            "}\n",
            "\n",
            "问题4:\n",
            "- 我开始工作时也常需要先进入状态。你是如何快速投入工作的呢?有什么诀窍可以帮助自己更高效地开始工作吗?\n",
            "\n",
            "评估结果:\n",
            "{\n",
            "  \"problem\": \"我开始工作时也常需要先进入状态。你是如何快速投入工作的呢?有什么诀窍可以帮助自己更高效地开始工作吗?\",\n",
            "  \"if_plausible\": true,\n",
            "  \"related_sub_factor\": \"C6\",\n",
            "  \"reason\": \"这个问题涉及到被试在开始工作时快速进入状态的能力。如果被试提供了一些方法或技巧来帮助自己更高效地开始工作，并强调思考后行动的重要性，那么可以推断被试在conscientiousness的facet C6（Deliberation）上得分较高。\"\n",
            "}\n",
            "\n",
            "\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "\n",
        "suffix_prompt = \"\"\"我已经设计了一些访谈的问题，请为我评估这些访谈问题中的每一个，在访谈中是否能够正确测试出被试对应人格特质的程度\n",
        "以及假设一些被试的回答，在reason中用\"如果被试回答xxx，那就意味着xxx\"的语气进行评估。输出成jsonl格式，每个问题的评估包含\n",
        "\"problem\",\"if_plausible\",\"related_sub_factor\",\"reason\"四个字段。\n",
        "- if_plausible输出True或者False\n",
        "- related_sub_factor 指出problem对应能够测试被试的哪个子维度的人格特质\n",
        "\"\"\"\n"
      ],
      "metadata": {
        "id": "HSkLclek6WRe"
      },
      "execution_count": 188,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "sel_id = 4\n",
        "print(prob_in_chunk[sel_id+1])\n",
        "print(response_in_chunk[sel_id])"
      ],
      "metadata": {
        "id": "ayn8lIH96nX7"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "sel_id = 4\n",
        "# print(prob_in_chunk[sel_id+1])\n",
        "# print(response_in_chunk[sel_id])\n",
        "\n",
        "whole_message = prefix_prompt\n",
        "\n",
        "# problems = [x for x in datas if x['factor'] == current_factor]\n",
        "\n",
        "current_factor = prob_in_chunk[sel_id+1][0]['factor']\n",
        "\n",
        "target_message = f\"我正在设计一个心理学的实验，我希望通过访谈，去评估被试在大五人格中 {current_factor}的程度\"\n",
        "\n",
        "whole_message += '\\n'\n",
        "whole_message += target_message\n",
        "whole_message += '\\n'\n",
        "\n",
        "whole_message += big5_to_detail[current_factor]\n",
        "\n",
        "whole_message += suffix_prompt\n",
        "\n",
        "whole_message += \"我设计的问题如下:\\n\"\n",
        "\n",
        "for prob in prob_in_chunk[sel_id+1]:\n",
        "    whole_message += '- ' + prob['question']\n",
        "    whole_message += '\\n'\n",
        "\n",
        "# print(whole_message)\n",
        "messages = [\n",
        "    SystemMessage(content=smart_system_prompt),\n",
        "    HumanMessage(content=whole_message),\n",
        "]\n",
        "\n",
        "response = chat(messages)\n",
        "print(response.content)\n"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "r3S_7VsnwSWn",
        "outputId": "ce0ce01c-2ea7-4c48-a27d-ec0c8c076d5d"
      },
      "execution_count": 194,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "{\n",
            "  \"problem\": \"我也会有自卑的感觉。你是否也会感到自己不如别人?这种比较会让你有什么样的感受?\",\n",
            "  \"if_plausible\": true,\n",
            "  \"related_sub_factor\": \"N4 SELF-CONSCIOUSNESS\",\n",
            "  \"reason\": \"如果被试回答是肯定的，表示被试在自我意识方面得分较高，对自己与他人的比较敏感，容易感到自卑和不如别人。\"\n",
            "}\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "len(extract_json(response.content))"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "8i1JnJzT68dE",
        "outputId": "8a60ed96-10f3-44ca-9053-f0b2185c8c5c"
      },
      "execution_count": 192,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "0"
            ]
          },
          "metadata": {},
          "execution_count": 192
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "print(response.content)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "WZ5kdhME9ZDb",
        "outputId": "aff0ec01-b44a-4a70-9255-e091e5f79593"
      },
      "execution_count": 193,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "根据你提供的问题，我将评估每个问题在访谈中是否能够正确测试出被试在大五人格特质中conscientiousness的程度。以下是对每个问题的评估结果：\n",
            "\n",
            "问题1:\n",
            "- 你平时是如何制定和实现目标的呢?有没有什么特别的方法可以提高目标的完成度呢?\n",
            "\n",
            "评估结果:\n",
            "- if_plausible: True\n",
            "- related_sub_factor: C4 (Achievement Striving)\n",
            "- reason: 如果被试回答他们制定明确的目标并采取积极的行动来实现这些目标，那就意味着他们在C4（追求成就）方面得分较高。这表明他们有较高的抱负水平，努力工作以实现自己的目标。\n",
            "\n",
            "问题2:\n",
            "- 你对追求卓越有什么独特的见解吗?怎样才能在生活中更进一步呢?\n",
            "\n",
            "评估结果:\n",
            "- if_plausible: True\n",
            "- related_sub_factor: C4 (Achievement Striving)\n",
            "- reason: 如果被试回答他们对追求卓越有强烈的渴望，并提出了具体的方法和策略来在生活中取得更大的进步，那就意味着他们在C4（追求成就）方面得分较高。这表明他们有较高的抱负水平，努力追求卓越。\n",
            "\n",
            "问题3:\n",
            "- 我有时挺拖延的。你平时如何规划时间、提高执行效率的呢?有什么技巧可以帮助自己更准时完成任务吗?\n",
            "\n",
            "评估结果:\n",
            "- if_plausible: True\n",
            "- related_sub_factor: C5 (Self-Discipline)\n",
            "- reason: 如果被试回答他们采取了明确的时间规划和执行效率提高的方法，并提供了一些技巧来帮助自己更好地管理时间和准时完成任务，那就意味着他们在C5（自律）方面得分较高。这表明他们有较强的自我控制能力和执行力。\n",
            "\n",
            "问题4:\n",
            "- 我开始工作时也常需要先进入状态。你是如何快速投入工作的呢?有什么诀窍可以帮助自己更高效地开始工作吗?\n",
            "\n",
            "评估结果:\n",
            "- if_plausible: True\n",
            "- related_sub_factor: C6 (Deliberation)\n",
            "- reason: 如果被试回答他们有一些方法和技巧可以帮助自己快速进入工作状态，并提到了一些策略来提高工作效率，那就意味着他们在C6（审慎性）方面得分较高。这表明他们在行动之前会仔细考虑，并能够迅速适应工作环境。\n",
            "\n",
            "以上是对你设计的问题在评估被试conscientiousness程度方面的分析和评估结果。请注意，这些评估结果仅基于被试的回答假设，实际的评估需要综合考虑多个因素。\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "response_in_chunk[4] = \"\"\"\n",
        "{\n",
        "  \"problem\": \"我也会有自卑的感觉。你是否也会感到自己不如别人?这种比较会让你有什么样的感受?\",\n",
        "  \"if_plausible\": true,\n",
        "  \"related_sub_factor\": \"N4 SELF-CONSCIOUSNESS\",\n",
        "  \"reason\": \"如果被试回答是肯定的，表示被试在自我意识方面得分较高，对自己与他人的比较敏感，容易感到自卑和不如别人。\"\n",
        "}\n",
        "\"\"\""
      ],
      "metadata": {
        "id": "_EW8rEISmNdF"
      },
      "execution_count": 195,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "sel_id = 8\n",
        "\n",
        "print(response_in_chunk[sel_id])"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "PZ1mPno_9u7q",
        "outputId": "8370f43f-3deb-4bea-ba3f-7e53463a8c09"
      },
      "execution_count": 196,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "根据你提供的问题，我将评估每个问题在访谈中是否能够正确测试出被试对应人格特质的程度，并提供假设一些被试的回答的评估。以下是评估结果：\n",
            "\n",
            "1. 问题: 我最近参加了一个聚会，认识了很多新朋友，感觉交谈的时候彼此距离拉近了很多。你平时喜欢参加聚会认识新朋友吗？\n",
            "   - if_plausible: True\n",
            "   - related_sub_factor: E2 Gregariousness\n",
            "   - reason: 这个问题可以测试被试在E2 Gregariousness子维度中的程度。如果被试回答喜欢参加聚会认识新朋友，那就意味着他们在社交方面更加外向，喜欢与他人交往，享受与人相处的经历。\n",
            "\n",
            "2. 问题: 我个人更喜欢一个人安静工作，可以让我更专注。你在工作的时候喜欢跟其他人一起合作，还是更习惯自己一个人独立完成呢？\n",
            "   - if_plausible: True\n",
            "   - related_sub_factor: E3 Assertiveness\n",
            "   - reason: 这个问题可以测试被试在E3 Assertiveness子维度中的程度。如果被试回答更喜欢一个人安静工作，更习惯自己独立完成任务，那就意味着他们在社交方面相对较内向，更倾向于保持低调，不太喜欢在工作中与他人合作。\n",
            "\n",
            "3. 问题: 我每天都保持很高的工作效率，生活也比较忙碌。你平时的生活节奏是较快还是较慢的呢？喜欢积极参与各种活动吗？\n",
            "   - if_plausible: True\n",
            "   - related_sub_factor: E4 Activity\n",
            "   - reason: 这个问题可以测试被试在E4 Activity子维度中的程度。如果被试回答生活节奏较快，喜欢积极参与各种活动，那就意味着他们在生活中更加快节奏，喜欢保持忙碌和参与各种活动。\n",
            "\n",
            "4. 问题: 你认为自己是一个乐观开朗的人吗？生活中容易感受到快乐吗？\n",
            "   - if_plausible: True\n",
            "   - related_sub_factor: E6 Positive Emotions\n",
            "   - reason: 这个问题可以测试被试在E6 Positive Emotions子维度中的程度。如果被试回答认为自己是一个乐观开朗的人，生活中容易感受到快乐，那就意味着他们更容易体验到积极的情绪，如喜悦、乐观和兴奋。\n",
            "\n",
            "请注意，以上评估结果仅基于问题的描述和假设的回答，实际的评估结果可能需要结合更多的信息和综合判断。\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "response_in_chunk[8] = \"\"\"\n",
        "{\n",
        "  \"problem\": \"我最近参加了一个聚会，认识了很多新朋友，感觉交谈的时候彼此距离拉近了很多。你平时喜欢参加聚会认识新朋友吗？\",\n",
        "  \"if_plausible\": true,\n",
        "  \"related_sub_factor\": \"E2 Gregariousness\",\n",
        "  \"reason\": \"这个问题可以测试被试在E2 Gregariousness子维度中的程度。如果被试回答喜欢参加聚会认识新朋友，那就意味着他们在社交方面更加外向，喜欢与他人交往，享受与人相处的经历。\"\n",
        "}\n",
        "{\n",
        "  \"problem\": \"我个人更喜欢一个人安静工作，可以让我更专注。你在工作的时候喜欢跟其他人一起合作，还是更习惯自己一个人独立完成呢?\",\n",
        "  \"if_plausible\": true,\n",
        "  \"related_sub_factor\": \"E3 Assertiveness\",\n",
        "  \"reason\": \"这个问题可以测试被试在E3 Assertiveness子维度中的程度。如果被试回答更喜欢一个人安静工作，更习惯自己独立完成任务，那就意味着他们在社交方面相对较内向，更倾向于保持低调，不太喜欢在工作中与他人合作。\"\n",
        "}\n",
        "{\n",
        "  \"problem\": \"我每天都保持很高的工作效率，生活也比较忙碌。你平时的生活节奏是较快还是较慢的呢？喜欢积极参与各种活动吗?\",\n",
        "  \"if_plausible\": true,\n",
        "  \"related_sub_factor\": \"E4 Activity\",\n",
        "  \"reason\": \"这个问题可以测试被试在E4 Activity子维度中的程度。如果被试回答生活节奏较快，喜欢积极参与各种活动，那就意味着他们在生活中更加快节奏，喜欢保持忙碌和参与各种活动。\"\n",
        "}\n",
        "{\n",
        "  \"problem\": \"你认为自己是一个乐观开朗的人吗？生活中容易感受到快乐吗？\",\n",
        "  \"if_plausible\": true,\n",
        "  \"related_sub_factor\": \"E6 Positive Emotions\",\n",
        "  \"reason\": \"这个问题可以测试被试在E6 Positive Emotions子维度中的程度。如果被试回答认为自己是一个乐观开朗的人，生活中容易感受到快乐，那就意味着他们更容易体验到积极的情绪，如喜悦、乐观和兴奋。\"\n",
        "}\n",
        "\"\"\""
      ],
      "metadata": {
        "id": "_6F-egBX9yEf"
      },
      "execution_count": 197,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "sel_id = 27\n",
        "\n",
        "print(response_in_chunk[sel_id])"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "sOTrtpWU-Xpj",
        "outputId": "13941914-e129-42ad-bb8b-99f2bad38a99"
      },
      "execution_count": 199,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "根据你提供的问题，我将评估每个问题在访谈中是否能够正确测试出被试在大五人格特质中conscientiousness的程度。以下是对每个问题的评估结果：\n",
            "\n",
            "问题1:\n",
            "- 你平时是如何制定和实现目标的呢?有没有什么特别的方法可以提高目标的完成度呢?\n",
            "\n",
            "评估结果:\n",
            "- if_plausible: True\n",
            "- related_sub_factor: C4 (Achievement Striving)\n",
            "- reason: 如果被试回答他们制定明确的目标并采取积极的行动来实现这些目标，那就意味着他们在C4（追求成就）方面得分较高。这表明他们有较高的抱负水平，并且努力工作以实现自己的目标。\n",
            "\n",
            "问题2:\n",
            "- 你对追求卓越有什么独特的见解吗?怎样才能在生活中更进一步呢?\n",
            "\n",
            "评估结果:\n",
            "- if_plausible: True\n",
            "- related_sub_factor: C4 (Achievement Striving)\n",
            "- reason: 如果被试回答他们对追求卓越有强烈的渴望，并提出了具体的方法和策略来在生活中取得更大的进步，那就意味着他们在C4（追求成就）方面得分较高。这表明他们有较高的抱负水平，并且致力于不断提高自己。\n",
            "\n",
            "问题3:\n",
            "- 我有时挺拖延的。你平时如何规划时间、提高执行效率的呢?有什么技巧可以帮助自己更准时完成任务吗?\n",
            "\n",
            "评估结果:\n",
            "- if_plausible: True\n",
            "- related_sub_factor: C5 (Self-Discipline)\n",
            "- reason: 如果被试回答他们采取了明确的时间规划和执行效率提高的方法，并提供了一些技巧来帮助自己更好地管理时间和准时完成任务，那就意味着他们在C5（自律性）方面得分较高。这表明他们有较强的自我约束能力和执行力。\n",
            "\n",
            "问题4:\n",
            "- 我开始工作时也常需要先进入状态。你是如何快速投入工作的呢?有什么诀窍可以帮助自己更高效地开始工作吗?\n",
            "\n",
            "评估结果:\n",
            "- if_plausible: True\n",
            "- related_sub_factor: C6 (Deliberation)\n",
            "- reason: 如果被试回答他们有一些方法和技巧可以帮助自己快速进入工作状态，并提到了一些策略来提高工作效率，那就意味着他们在C6（审慎性）方面得分较高。这表明他们在行动之前会仔细考虑，并具有较高的决策思考能力。\n",
            "\n",
            "以上是对你设计的问题在评估被试conscientiousness程度方面的结果。请注意，这只是根据问题的可能回答进行的初步评估，实际的评估结果还需要结合更多的信息和综合判断。\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "response_in_chunk[27] = \"\"\"\n",
        "{\n",
        "  \"problem\": \"你平时是如何制定和实现目标的呢?有没有什么特别的方法可以提高目标的完成度呢?\",\n",
        "  \"if_plausible\": true,\n",
        "  \"related_sub_factor\": \"C4 (Achievement Striving)\",\n",
        "  \"reason\": \"如果被试回答他们制定明确的目标并采取积极的行动来实现这些目标，那就意味着他们在C4（追求成就）方面得分较高。这表明他们有较高的抱负水平，并且努力工作以实现自己的目标。\"\n",
        "}\n",
        "{\n",
        "  \"problem\": \"你对追求卓越有什么独特的见解吗?怎样才能在生活中更进一步呢?\",\n",
        "  \"if_plausible\": true,\n",
        "  \"related_sub_factor\": \"C4 (Achievement Striving)\",\n",
        "  \"reason\": \"如果被试回答他们对追求卓越有强烈的渴望，并提出了具体的方法和策略来在生活中取得更大的进步，那就意味着他们在C4（追求成就）方面得分较高。这表明他们有较高的抱负水平，并且致力于不断提高自己。\"\n",
        "}\n",
        "{\n",
        "  \"problem\": \"我有时挺拖延的。你平时如何规划时间、提高执行效率的呢?有什么技巧可以帮助自己更准时完成任务吗?\",\n",
        "  \"if_plausible\": true,\n",
        "  \"related_sub_factor\": \"C5 (Self-Discipline)\",\n",
        "  \"reason\": \"如果被试回答他们采取了明确的时间规划和执行效率提高的方法，并提供了一些技巧来帮助自己更好地管理时间和准时完成任务，那就意味着他们在C5（自律性）方面得分较高。这表明他们有较强的自我约束能力和执行力。\"\n",
        "}\n",
        "{\n",
        "  \"problem\": \"我开始工作时也常需要先进入状态。你是如何快速投入工作的呢?有什么诀窍可以帮助自己更高效地开始工作吗?\",\n",
        "  \"if_plausible\": true,\n",
        "  \"related_sub_factor\": \"C6 (Deliberation)\",\n",
        "  \"reason\": \"如果被试回答他们有一些方法和技巧可以帮助自己快速进入工作状态，并提到了一些策略来提高工作效率，那就意味着他们在C6（审慎性）方面得分较高。这表明他们在行动之前会仔细考虑，并具有较高的决策思考能力。\"\n",
        "}\n",
        "\"\"\""
      ],
      "metadata": {
        "id": "lBrkT0KD-cCd"
      },
      "execution_count": 201,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "whole_message = prefix_prompt\n",
        "\n",
        "\n",
        "current_factor = long_name['O']\n",
        "\n",
        "problems = [x for x in datas if x['factor'] == current_factor]\n",
        "\n",
        "##TODO 在这里外部增加一个for循环\n",
        "\n",
        "target_message = f\"我正在设计一个心理学的实验，我希望通过访谈，去评估被试在大五人格中 {current_factor}的程度\"\n",
        "\n",
        "whole_message += '\\n'\n",
        "whole_message += target_message\n",
        "whole_message += '\\n'\n",
        "\n",
        "whole_message += big5_to_detail[current_factor]\n",
        "\n",
        "whole_message += suffix_prompt\n",
        "\n",
        "whole_message += \"我设计的问题如下:\\n\"\n",
        "\n",
        "#TODO : 改为从problems中4个一组抽取\n",
        "for prob_id in range(4):\n",
        "    whole_message += '- ' + problems[prob_id]['question']\n",
        "    whole_message += '\\n'\n",
        "\n",
        "\n",
        "messages = [\n",
        "    SystemMessage(content=smart_system_prompt),\n",
        "    HumanMessage(content=whole_message),\n",
        "]\n",
        "\n",
        "response = chat(messages)\n",
        "\n",
        "json_response = extract_json(response.content)\n",
        "\n",
        "我想在这段代码的前面增加一个for循环，把中间的prob_id变为4个一组抽取（现在只抽取了前4个）。如果最后一组不满4个则有多少抽取多少。请为我用python实现"
      ],
      "metadata": {
        "id": "VMFlysK8kmJY"
      }
    }
  ]
}